Prerequisites:
  • A valid Qubrid AI account, logged in to the platform
  • Sufficient credits in your account to provision and run GPU instances
The GPU Compute feature enables users to run AI and compute-intensive tasks on high-performance GPUs. It provides flexibility in choosing compute environments, interfaces, and auto-shutdown options to help optimize usage and control costs effectively.
We don't split your instance or share it with other users based on your usage. You get access to the entire GPU Virtual Machine, so you can unlock its full potential.

Quick Deploy GPU Virtual Machines

1

Head over to the GPU Instances tab from the left menu

This will open up a card view of all available GPU Instances
2

Select the preferred GPU

Choose from the pool of GPU VMs available. We offer the latest NVIDIA GPUs such as B200, H200, H100, A100, L40S, A10G, T4 & L4. The RAM, vCPU, storage & CUDA version are shown on the cards
3

Select an AI/ML Template

By default, Ubuntu 22.04 is selected as the template
You can change this by clicking the Change Template button. A dialog box will open showing the available templates; simply click the template you want and it will be assigned
4

Change GPU if you want

Even though you have already selected a GPU, you can still change it here depending on your needs
5

Select the number of GPUs needed

You can select 1, 4, or 8 GPUs depending on your use case
The 1-GPU and 4-GPU options may not always be in stock; it depends on availability. If you need a configuration that isn't listed, or a custom number of GPUs, contact us via Support. vCPU cores & RAM are populated automatically
6

Select Root Disk Storage

Choose a storage size from the dropdown options, ranging from 100 GB to 2 TB
Root disk storage is billed at $0.10 per GB per month & is charged even when the instance is stopped. Example: 100 GB costs $10/month
7

Select your Interface

You can access the GPU VM either via SSH or via Jupyter. To learn how to generate an SSH key, visit our Documentation. If you want to use Jupyter, provide your Jupyter authentication token
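For example, once the instance is running you can reach it programmatically over SSH. Below is a minimal sketch using the paramiko Python library; the host IP, username, and key path are placeholders, not real values, so replace them with the details shown for your instance:

```python
# Minimal sketch: connect to a GPU instance over SSH and check GPU visibility.
# HOST, USER, and KEY_PATH are placeholders, not actual Qubrid values.
import os
import paramiko

HOST = "203.0.113.10"                               # placeholder: your instance's public IP
USER = "ubuntu"                                     # placeholder: username for your template
KEY_PATH = os.path.expanduser("~/.ssh/id_ed25519")  # private key matching the uploaded public key

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, username=USER, key_filename=KEY_PATH)

# Run nvidia-smi to confirm the GPUs are visible inside the VM
_, stdout, _ = client.exec_command("nvidia-smi")
print(stdout.read().decode())

client.close()
```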
8

Configure Auto Stop

This automatically stops your instance after a time period you define. Click the dropdown and select the number of hours you want to set auto stop to. If you don't want to use this feature, leave it at the default value of Never
9

Review your Instance

The right side of the dashboard updates with every option change. Give it a final review to confirm the instance matches your needs
10

Select the Commitment Period

Choose from the On-Demand pool or commit to a longer term for more discounts.
If you select any option other than On-Demand, a request will be sent to our team & someone will reach out to provision the instance for you. Selecting On-Demand gives you instant access
11

Click on Launch

Launch your On-Demand GPU instance or submit a request for a longer-term commitment
12

Submit your Request (for longer commitments)

Review your selection & add a note if you have any message for our Platform team.
You can also click on the Reset button in case you want to start all over again.

Choosing the Right GPU

Different workloads require different levels of performance, memory, and cost efficiency. Below is guidance on when to choose each GPU type:

NVIDIA B200 (180 GB)

  • Best for: Next-gen large-scale AI model training and high-throughput inference.
  • Why: Highest VRAM capacity with extreme bandwidth, optimized for cutting-edge foundation models and multi-trillion parameter research.

NVIDIA H200 (141 GB)

  • Best for: Large LLMs, enterprise-scale training, and memory-intensive inference.
  • Why: Higher memory than H100 with strong bandwidth, designed for advanced generative AI workloads.

NVIDIA H100 (80 GB)

  • Best for: High-performance model training, fine-tuning, and distributed workloads.
  • Why: Current industry standard for large model training; excellent tensor performance.

NVIDIA A100 (80 GB / 40 GB)

  • Best for: Training and inference at scale; versatile for research and production.
  • Why: Proven workhorse for AI/ML; available in 40 GB and 80 GB VRAM options depending on dataset/model size.

NVIDIA L40S (48 GB)

  • Best for: Balanced training, inference, and AI-enhanced graphics workloads.
  • Why: Strong GPU compute with large memory; good middle ground for enterprises running mixed AI + visualization tasks.

NVIDIA A10G (24 GB)

  • Best for: Medium-scale training, fine-tuning, and inference for open-source models.
  • Why: Cost-efficient GPU for developers and teams experimenting with models up to mid-range size.

NVIDIA T4 (16 GB)

  • Best for: Lightweight inference, prototyping, and smaller-scale AI services.
  • Why: Low-cost, energy-efficient option; good for deploying chatbots, RAG pipelines, or small LLMs.

NVIDIA L4 (24 GB)

  • Best for: Cloud inference, AI-powered video, and general-purpose ML tasks.
  • Why: Modern upgrade over T4 with more memory and stronger inference throughput.
For research & cutting-edge models: B200, H200, H100. For enterprise training/inference balance: A100, L40S. For developers & startups (cost-efficient): A10G, L4, T4
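
As a rough way to narrow this choice, you can estimate whether a model fits in a GPU's VRAM from its parameter count. The sketch below is a common heuristic, not an official Qubrid sizing rule: it assumes ~2 bytes per parameter for FP16/BF16 weights plus ~20% overhead for activations and KV cache; real requirements vary with batch size, context length, and framework.

```python
# Rough rule-of-thumb sketch (not an exact sizing tool):
# FP16/BF16 weights take ~2 bytes per parameter; add ~20% overhead
# for activations, KV cache, and framework buffers.
GPUS_GB = {"B200": 180, "H200": 141, "H100": 80, "A100": 80,
           "L40S": 48, "A10G": 24, "L4": 24, "T4": 16}

def fits(params_billions: float, bytes_per_param: float = 2.0, overhead: float = 1.2):
    """Return the estimated VRAM needed and the GPUs that can hold the model on a single card."""
    needed_gb = params_billions * 1e9 * bytes_per_param * overhead / 1e9
    return needed_gb, [gpu for gpu, vram in GPUS_GB.items() if vram >= needed_gb]

needed, candidates = fits(13)   # e.g. a 13B-parameter model
print(f"~{needed:.0f} GB needed -> fits on: {candidates}")
# ~31 GB needed -> fits on: ['B200', 'H200', 'H100', 'A100', 'L40S']
```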

Root Disk (GB)

The Root Disk is the primary storage attached to your GPU instance. It holds the operating system, dependencies, and any data you store locally.

Key Points

  • Default Size: Each instance comes with a default root disk (e.g., 100 GB).
  • Customizable: You can increase disk size at deployment time to accommodate datasets, models, or logs.
  • Persistent Billing: Root disk storage is billed at $0.10 per GB per month and is charged even if the instance is stopped.
  • Example: A 100 GB disk costs $10 per month.
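
The storage charge is simple arithmetic at the $0.10 per GB per month rate stated above; a quick sketch of the calculation:

```python
# Root disk billing: $0.10 per GB per month, charged even while the instance is stopped.
RATE_PER_GB_MONTH = 0.10

def monthly_storage_cost(size_gb: int) -> float:
    return size_gb * RATE_PER_GB_MONTH

for size in (100, 500, 2000):
    print(f"{size:>5} GB root disk -> ${monthly_storage_cost(size):.2f}/month")
# 100 GB -> $10.00/month, 500 GB -> $50.00/month, 2000 GB -> $200.00/month
```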

When to Increase Root Disk

  • Training large models that require big datasets stored locally.
  • Running workflows that generate heavy intermediate files or logs.
  • Deploying multiple frameworks or custom libraries on the same instance.

When to Keep It Minimal

  • Using external object storage or mounted volumes for datasets.
  • Running lightweight inference or stateless applications.
  • Optimizing costs when persistent local storage is not required.

Pausing GPU Instances

You can pause/stop GPU instances when they are not in use to reduce costs.

Key Points

  • No Compute Charges: When an instance is paused, you are not charged for GPU compute.
  • Storage Charges Continue: While paused, you will still be charged for the root disk and any attached storage.
  • Example:
    • If you pause a GPU instance with a 100 GB root disk, compute costs stop immediately.
    • Storage charges continue at $10/month until you delete the disk.
Pause instances when not running jobs to save on compute costs. Delete unused disks or move data to external object storage if long-term retention is not required.
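
To see how much pausing saves, the sketch below compares one month of compute plus storage against storage alone. The $2.50/hour GPU rate is a hypothetical placeholder, not an actual Qubrid price; only the $0.10 per GB per month storage rate comes from this page.

```python
# Hypothetical comparison: running vs. paused instance for one month.
GPU_RATE_PER_HOUR = 2.50          # placeholder only, NOT a real Qubrid rate
STORAGE_RATE_PER_GB_MONTH = 0.10  # root disk rate from this page
HOURS_PER_MONTH = 730

def monthly_cost(root_disk_gb: int, hours_running: float) -> float:
    compute = hours_running * GPU_RATE_PER_HOUR
    storage = root_disk_gb * STORAGE_RATE_PER_GB_MONTH  # billed even while paused
    return compute + storage

print(f"Running 24/7:     ${monthly_cost(100, HOURS_PER_MONTH):.2f}")
print(f"Paused all month: ${monthly_cost(100, 0):.2f}")  # storage only: $10.00
```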

Auto Stop

The Auto Stop feature allows you to automatically shut down GPU instances after a specified period of inactivity or based on a timer defined by the user.

Key Points

  • Automatic Shutdown: Instances will automatically stop after the configured time limit.
  • User Defined: You can set the time (e.g., 1 hour, 6 hours, 24 hours) based on your workflow.
  • Save Costs: Prevents GPU instances from running idle and accumulating unnecessary compute charges.
  • Storage Charges Remain: When an instance is auto-stopped, compute charges stop, but root disk and attached storage charges continue.

Example

  • You launch a GPU instance with a 6-hour Auto Stop setting.
  • After 6 hours, the instance automatically shuts down if still running.
  • Compute charges end immediately, but storage continues to be billed.
Always enable Auto Stop for experiments, prototyping, or jobs with predictable runtimes. Use manual control (Pause/Resume) for production workloads that need to stay online.

Frequently Asked Questions (FAQ)